Real-time parking hotspots
Authored by: Siju Deng
Duration: 90 mins
Level: Intermediate
Pre-requisite Skills: Python, Machine Learning / Data Wrangling
Scenario

As an ordinary resident living in Melbourne, I want to avoid the difficulty of parking in the city.

  • Where is it difficult to park now?
  • Is it peak parking time now?
What this use case will teach you

At the end of this use case you will:

  • Get to know the Melbourne city centre parking map
  • Understand the current parking hotspots in the Melbourne city centre
  • Learn about multi-file data filtering
  • Deepen your understanding of snapshot data
  • Appreciate the necessity of data processing
Limitations and Purpose
  • This notebook performs an exploratory data analysis of Melbourne parking sensor data and tries combining it with Melbourne parking restriction data. It consists of exploratory analysis, attempts, and assumptions rather than rigorous conclusions.
  • Because the sensor data is a snapshot, the data in this notebook is a historical record of the moment the notebook was run. The notebook serves only as a reference and tutorial and does not provide real-time data; download and re-run it if you need current figures. Owing to sensor failures and update delays, data accuracy is not guaranteed.
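
Given the snapshot nature described above, it can be useful to gauge how stale a freshly downloaded export is before drawing conclusions. A minimal sketch, assuming a dataframe with a `lastupdated` column like the one fetched below (the timestamps and reference time here are illustrative, not real data):

```python
import pandas as pd

# Toy snapshot with a 'lastupdated' column, mirroring the sensor export below
df = pd.DataFrame({
    'lastupdated': ['2024-05-11T03:30:26+00:00', '2024-04-30T04:44:26+00:00']
})
df['lastupdated'] = pd.to_datetime(df['lastupdated'], utc=True)

# Age of each record relative to a fixed reference time
now = pd.Timestamp('2024-05-11T04:00:00+00:00')
age = now - df['lastupdated']
print(age.max())  # the oldest record in the snapshot
```

Records whose age is large relative to the sensor update cadence are candidates for removal, which motivates the time-based filtering performed later in this notebook.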

Get and process data

In [8]:
# from google.colab import drive
import os
import requests
import json
import pytz
import folium
import pandas as pd
import matplotlib.pyplot as plt
from io import StringIO
from datetime import datetime, timezone
from pandas.tseries.offsets import Minute
from sklearn.cluster import DBSCAN

Fetch Data from Open Data Melbourne

In [9]:
# drive.mount('/content/gdrive')

# with open('/content/gdrive/My Drive/Colab Notebooks/apiKey.txt', 'r') as file:
#     api_key = file.read().strip()

# api_key = os.getenv('API_KEY')  # alternatively, read the key from an environment variable
In [10]:
# export endpoint

# https://data.melbourne.vic.gov.au/explore/dataset/on-street-parking-bay-sensors/information/?dataChart=eyJxdWVyaWVzIjpbeyJjaGFydHMiOlt7InR5cGUiOiJjb2x1bW4iLCJmdW5jIjoiQ09VTlQiLCJ5QXhpcyI6ImJheV9pZCIsInNjaWVudGlmaWNEaXNwbGF5Ijp0cnVlLCJjb2xvciI6IiNFNTBFNTYifV0sInhBeGlzIjoibGFzdHVwZGF0ZWQiLCJtYXhwb2ludHMiOjUwLCJzb3J0IjoiIiwidGltZXNjYWxlIjoieWVhciIsImNvbmZpZyI6eyJkYXRhc2V0Ijoib24tc3RyZWV0LXBhcmtpbmctYmF5LXNlbnNvcnMiLCJvcHRpb25zIjp7fX19XSwidGltZXNjYWxlIjoiIiwiZGlzcGxheUxlZ2VuZCI6dHJ1ZSwiYWxpZ25Nb250aCI6dHJ1ZX0%3D
dataset_id = 'on-street-parking-bay-sensors'

base_URL = 'https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets/'
# apikey = api_key
export_format = 'csv'  # avoid shadowing the built-in format()

url = f'{base_URL}{dataset_id}/exports/{export_format}'
params = {
    'select': '*',
    'limit': -1,  # all records
    'lang': 'en',
    'timezone': 'UTC',
    # 'api_key': apikey
}

# GET request
response = requests.get(url, params = params)

if response.status_code == 200:
    # StringIO to read the CSV data
    url_content = response.content.decode('utf-8')
    df = pd.read_csv(StringIO(url_content), delimiter = ';')
    print(df.sample(2, random_state = 999))    # test
else:
    print(f'Request failed with status code {response.status_code}')
                    lastupdated           status_timestamp  zone_number  \
2278  2024-05-11T03:30:26+00:00  2024-05-11T03:06:09+00:00       7740.0   
1566  2024-04-30T04:44:26+00:00  2024-04-15T00:56:50+00:00       7833.0   

     status_description  kerbsideid                                 location  
2278            Present       10742   -37.80379669737577, 144.95088211182127  
1566            Present       13610  -37.829771231755444, 144.96716983918282  
In [11]:
# check length in dataset
print(f'The dataset contains {len(df)} records.')
The dataset contains 6246 records.

Data preprocessing

Separate the latitude and longitude values into their own columns

In [12]:
print(df.dtypes)
lastupdated            object
status_timestamp       object
zone_number           float64
status_description     object
kerbsideid              int64
location               object
dtype: object
In [13]:
# Split the location string into latitude and longitude, then assign to new columns
df['latitude'] = df['location'].apply(lambda x: float(x.split(',')[0].strip()))
df['longitude'] = df['location'].apply(lambda x: float(x.split(',')[1].strip()))

# Drop the original location column
df.drop(columns = ['location'], inplace=True)
In [14]:
print(df.head(3))
                 lastupdated           status_timestamp  zone_number  \
0  2023-10-25T01:44:02+00:00  2023-10-25T01:07:05+00:00       7556.0   
1  2023-10-25T01:44:02+00:00  2023-10-25T00:57:43+00:00       7556.0   
2  2023-10-25T01:44:02+00:00  2023-10-25T01:12:46+00:00       7556.0   

  status_description  kerbsideid   latitude   longitude  
0            Present        5730 -37.810582  144.968053  
1            Present        5728 -37.810566  144.968109  
2            Present        5750 -37.810558  144.968137  

Check null values and data types

In [15]:
def validate_data(df):
    print("\nChecking for NaN values:")
    print(df.isna().sum())
    print("\nChecking for Duplicates:")
    print(df.duplicated().sum())
    print("\nChecking data types:")
    print(df.dtypes)

validate_data(df)
Checking for NaN values:
lastupdated             0
status_timestamp        0
zone_number           487
status_description      0
kerbsideid              0
latitude                0
longitude               0
dtype: int64

Checking for Duplicates:
0

Checking data types:
lastupdated            object
status_timestamp       object
zone_number           float64
status_description     object
kerbsideid              int64
latitude              float64
longitude             float64
dtype: object

Since the zone number does not affect the geographic coordinates or the sensor ID, the missing zone_number values are left untouched. Next, check for and remove abnormal time records: delete any record older than one year (this also removes Unix-epoch placeholder timestamps such as 1970-01-01).

In [16]:
# Define the function to convert columns to datetime
def convert_to_datetime(df, column_name):
    try:
        df[column_name] = pd.to_datetime(df[column_name], errors = 'coerce', format = "%Y-%m-%dT%H:%M:%S+00:00")
    except ValueError as e:
        print(f"Error converting {column_name}: {e}")
In [17]:
# Create a new DataFrame for analysis based on the original df
df_analysis = df.copy()

# Apply the datetime conversion function
convert_to_datetime(df_analysis, 'lastupdated')
convert_to_datetime(df_analysis, 'status_timestamp')

# Check for NaT values in datetime columns
nat_counts = df_analysis[['lastupdated', 'status_timestamp']].isna().sum()
print("NaT values after conversion:")
print(nat_counts)
NaT values after conversion:
lastupdated         0
status_timestamp    0
dtype: int64
In [18]:
# Identify the most recent timestamp
latest_timestamp = df_analysis['status_timestamp'].max()

# Calculate the date one year ago from the most recent timestamp
one_year_ago = latest_timestamp - pd.DateOffset(years=1)

# Filter and process as per your previous logic
outdated_rows = df_analysis[df_analysis['status_timestamp'] < one_year_ago].copy()

print("Outdated rows:")
print(outdated_rows.sort_values(by = 'status_timestamp').head())

# Print the data header sorted by 'status_timestamp'
print("\nData header sorted by time:")
print(df_analysis.sort_values(by = 'status_timestamp').head())
Outdated rows:
            lastupdated status_timestamp  zone_number status_description  \
477 2023-12-11 23:45:34       1970-01-01       7498.0         Unoccupied   
67  2023-11-22 00:44:42       1970-01-01          NaN         Unoccupied   
114 2023-11-22 00:44:42       1970-01-01       7531.0         Unoccupied   
110 2023-11-22 00:44:42       1970-01-01       7531.0         Unoccupied   
664 2023-11-22 00:44:42       1970-01-01          NaN         Unoccupied   

     kerbsideid   latitude   longitude  
477        7740 -37.812619  144.953622  
67         8040 -37.818000  144.954134  
114        8033 -37.817782  144.954895  
110        8027 -37.817671  144.955279  
664        6246 -37.817832  144.958757  

Data header sorted by time:
            lastupdated status_timestamp  zone_number status_description  \
664 2023-11-22 00:44:42       1970-01-01          NaN         Unoccupied   
110 2023-11-22 00:44:42       1970-01-01       7531.0         Unoccupied   
114 2023-11-22 00:44:42       1970-01-01       7531.0         Unoccupied   
67  2023-11-22 00:44:42       1970-01-01          NaN         Unoccupied   
477 2023-12-11 23:45:34       1970-01-01       7498.0         Unoccupied   

     kerbsideid   latitude   longitude  
664        6246 -37.817832  144.958757  
110        8027 -37.817671  144.955279  
114        8033 -37.817782  144.954895  
67         8040 -37.818000  144.954134  
477        7740 -37.812619  144.953622  
In [19]:
# Delete outdated rows
df_analysis = df_analysis[df_analysis['status_timestamp'] >= one_year_ago]
In [20]:
df_analysis.shape
Out[20]:
(6154, 7)

Next, check for and remove duplicate records based on geographic coordinates, parking bay status, and sensor ID. Because retrieval uses repeated API calls, duplicate data may be returned, so this step cannot be omitted.

In [21]:
# Detect duplicates based on the key identifying columns
duplicates = df_analysis.duplicated(
    subset = ['kerbsideid', 'status_description', 'latitude', 'longitude'],
    keep = False)

# Display duplicates
print("Duplicate rows based on key columns:")
print(df_analysis[duplicates].sort_values('kerbsideid'))
Duplicate rows based on key columns:
Empty DataFrame
Columns: [lastupdated, status_timestamp, zone_number, status_description, kerbsideid, latitude, longitude]
Index: []
In [22]:
# Remove duplicates based on specified columns in df_analysis
df_analysis.drop_duplicates(
    subset = ['kerbsideid', 'status_description',
              'latitude', 'longitude', 'lastupdated'],
    keep = 'first', inplace = True)

Data analysis

Processing

Get the total number of unique parking locations

In [23]:
# Drop duplicate rows based on 'latitude' and 'longitude'
unique_locations = df_analysis.drop_duplicates(subset = ['latitude', 'longitude'])

# Count the number of unique parking spaces based on location
total_unique_parking_spaces = len(unique_locations)

print(f"The total number of unique parking spaces based on location is {total_unique_parking_spaces}.")
The total number of unique parking spaces based on location is 5800.

Find parking spaces that have more than one record, based on geographic location.

In [24]:
# Group by latitude and longitude and count occurrences in df_analysis
location_counts = df_analysis.groupby(
    ['latitude', 'longitude']).size().reset_index(name = 'Occurrences')

# Identify locations that appear more than once in df_analysis
duplicate_locations = location_counts[location_counts['Occurrences'] > 1]

# Print out the duplicate locations in df_analysis
if duplicate_locations.empty:
    print("Each geographic location occurs only once.")
else:
    print("Duplicate geographic locations found:")
    print(duplicate_locations)
Duplicate geographic locations found:
       latitude   longitude  Occurrences
1031 -37.819612  144.954532            2
1046 -37.819562  144.954509            2
1058 -37.819512  144.954486            2
1066 -37.819463  144.954464            2
1133 -37.819000  144.958300            2
...         ...         ...          ...
3862 -37.809577  144.971496            2
3868 -37.809559  144.971561            2
3869 -37.809551  144.971585            2
3871 -37.809544  144.971611            2
3874 -37.809536  144.971637            2

[353 rows x 3 columns]

Find geographic locations associated with multiple kerbside IDs

In [25]:
# Filter rows in df_analysis where geographic locations are duplicated
duplicate_locations = df_analysis[
    df_analysis.duplicated(subset = ['latitude', 'longitude'], keep = False)]

# Sort these rows in df_analysis by latitude, longitude, and kerbsideid for easier inspection
sorted_duplicates = duplicate_locations.sort_values(
    by = ['latitude', 'longitude', 'kerbsideid'])

# Print kerbsideid for duplicate locations in df_analysis
print("Kerbside id for duplicate locations:")
print(sorted_duplicates[['latitude', 'longitude', 'kerbsideid']])
Kerbside id for duplicate locations:
       latitude   longitude  kerbsideid
149  -37.819612  144.954532        8057
3578 -37.819612  144.954532       57924
595  -37.819562  144.954509        8056
2062 -37.819562  144.954509       57925
176  -37.819512  144.954486        8055
...         ...         ...         ...
3476 -37.809551  144.971585       65038
219  -37.809544  144.971611        6086
3477 -37.809544  144.971611       65037
35   -37.809536  144.971637        6087
3478 -37.809536  144.971637       65036

[707 rows x 3 columns]

Check whether there are duplicate kerbsideid values

In [26]:
# df_analysis is DataFrame for analysis
duplicate_kerbsideid = df_analysis[df_analysis['kerbsideid'].duplicated(keep = False)]

# Sort the data in df_analysis for easier inspection
duplicate_kerbsideid = duplicate_kerbsideid.sort_values('kerbsideid')

if len(duplicate_kerbsideid) > 0:
    print("Duplicate kerbsideid found in df_analysis:")
    print(duplicate_kerbsideid[['kerbsideid', 'latitude', 'longitude']])
else:
    print("No duplicate kerbsideid found in df_analysis.")
No duplicate kerbsideid found in df_analysis.

Find the earliest time each parking space was reported vacant

In [27]:
# Filter for 'Unoccupied' status in df_analysis
unoccupied_spaces = df_analysis[df_analysis['status_description'] == 'Unoccupied']

# Sort by 'status_timestamp' in df_analysis
sorted_unoccupied = unoccupied_spaces.sort_values(by = 'status_timestamp')

# Find the earliest time for each unique parking space based on latitude and longitude in df_analysis
earliest_unoccupied = sorted_unoccupied.drop_duplicates(
    subset = ['latitude', 'longitude'], keep = 'first')

# Display the result
print("The earliest time each parking space was detected as empty in df_analysis:")
print(earliest_unoccupied[['latitude', 'longitude', 'status_timestamp']])
The earliest time each parking space was detected as empty in df_analysis:
       latitude   longitude    status_timestamp
5257 -37.806193  144.952080 2023-05-27 16:40:41
3822 -37.806147  144.952089 2023-06-04 06:46:46
5256 -37.806170  144.952085 2023-06-05 07:35:13
3819 -37.806027  144.952110 2023-06-07 00:41:04
2303 -37.805884  144.952136 2023-06-07 01:13:53
...         ...         ...                 ...
4567 -37.812266  144.970216 2024-05-11 03:28:40
2001 -37.814475  144.974604 2024-05-11 03:28:41
6169 -37.815803  144.972748 2024-05-11 03:28:43
6238 -37.814150  144.960434 2024-05-11 03:28:43
3168 -37.813994  144.977744 2024-05-11 03:28:51

[1938 rows x 3 columns]

Check whether parking spaces that are currently vacant were reported occupied within the past few hours.

In [28]:
# Convert to datetime format
df_analysis['lastupdated'] = pd.to_datetime(df_analysis['lastupdated'])
df_analysis['status_timestamp'] = pd.to_datetime(df_analysis['status_timestamp'])

# Sort and filter
sorted_df = df_analysis.sort_values(by = ['latitude', 'longitude', 'status_timestamp'])
current_vacant = sorted_df[sorted_df['status_description'] == 'Unoccupied'].copy()

# Initialize result DataFrame
result = pd.DataFrame(columns = ['latitude', 'longitude', 'last_occupied_time'])

# Loop through each vacant space to check its history
for idx, row in current_vacant.iterrows():
    lat, lon, current_time = row['latitude'], row['longitude'], row['status_timestamp']
    time_threshold = current_time - pd.Timedelta(hours = 3)
    previous_records = sorted_df[
        (sorted_df['latitude'] == lat) &
        (sorted_df['longitude'] == lon) &
        (sorted_df['status_timestamp'] >= time_threshold) &
        (sorted_df['status_description'] == 'Present')]

    if not previous_records.empty:
        last_occupied_time = previous_records['status_timestamp'].max()
        new_row = pd.DataFrame(
            {'latitude': [lat], 'longitude': [lon],
             'last_occupied_time': [last_occupied_time]})
        result = pd.concat([result, new_row], ignore_index = True)

# Display the result
print("Parking spaces that were occupied a few hours ago and are now vacant:")
print(result)
Parking spaces that were occupied a few hours ago and are now vacant:
     latitude   longitude  last_occupied_time
0  -37.818448  144.959387 2024-03-13 03:03:31
1  -37.818156  144.957058 2024-05-11 01:52:59
2  -37.818018  144.954073 2024-05-10 20:33:28
3  -37.817982  144.954198 2024-04-03 22:49:13
4  -37.817890  144.958554 2024-05-11 02:09:09
..        ...         ...                 ...
60 -37.809800  144.970405 2024-05-11 02:40:45
61 -37.809787  144.972304 2024-05-11 03:06:17
62 -37.809599  144.971422 2024-05-11 03:06:00
63 -37.809577  144.971496 2024-05-11 02:18:32
64 -37.809544  144.971611 2024-05-11 00:56:13

[65 rows x 3 columns]
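
The row-by-row loop above scans the whole dataframe once per vacant space, which is quadratic in the number of records. The same check can be sketched with a grouped, vectorized approach; this is an illustrative alternative (the toy frame below stands in for `df_analysis`, with made-up coordinates and timestamps), not the notebook's actual computation:

```python
import pandas as pd

# Toy data mirroring df_analysis's relevant columns (values are illustrative)
df = pd.DataFrame({
    'latitude':  [-37.81, -37.81, -37.82, -37.82],
    'longitude': [144.95, 144.95, 144.96, 144.96],
    'status_description': ['Present', 'Unoccupied', 'Present', 'Unoccupied'],
    'status_timestamp': pd.to_datetime([
        '2024-05-11 01:00', '2024-05-11 02:00',    # occupied 1 h before vacancy
        '2024-05-10 20:00', '2024-05-11 02:00']),  # occupied 6 h before vacancy
})

# Latest 'Present' time per location
last_present = (df[df['status_description'] == 'Present']
                .groupby(['latitude', 'longitude'])['status_timestamp']
                .max().rename('last_occupied_time').reset_index())

# Attach it to the vacant records and keep those occupied within 3 hours
vacant = df[df['status_description'] == 'Unoccupied']
merged = vacant.merge(last_present, on=['latitude', 'longitude'], how='inner')
recent = merged[merged['status_timestamp'] - merged['last_occupied_time']
                <= pd.Timedelta(hours=3)]
print(recent[['latitude', 'longitude', 'last_occupied_time']])
```

Only the first location survives the 3-hour filter. This avoids both the per-row scan and the repeated `pd.concat` inside the loop, at the cost of only considering the single latest 'Present' record per location.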

Analyze results

  • Very few sensors report both statuses, while some locations carry two parking bay IDs. Each bay ID has only one status, and there are no duplicate IDs. This is likely due to the nature of the sensors: the recorded coordinates are the sensor's location, and one sensor can sometimes manage two parking bays.

  • This confirms that the dataset is snapshot data: when a sensor reports a state change, the old state is removed from the dataset. According to the official documentation, each sensor is also tested every morning, so today's data is the only reliable data; the remaining sensors may be out of service due to construction closures and similar reasons.
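
The "one sensor, two bays" observation can be checked directly by counting distinct bay IDs per coordinate. A minimal sketch with toy coordinates (a real check would group `df_analysis` the same way):

```python
import pandas as pd

# Toy frame: two bays share one sensor location, a third stands alone
df = pd.DataFrame({
    'latitude':  [-37.8196, -37.8196, -37.8190],
    'longitude': [144.9545, 144.9545, 144.9583],
    'kerbsideid': [8057, 57924, 6246],
})

# Count distinct bay IDs at each geographic location
ids_per_location = df.groupby(['latitude', 'longitude'])['kerbsideid'].nunique()
shared = ids_per_location[ids_per_location > 1]
print(shared)  # locations whose sensor reports for more than one bay
```

Unlike the earlier occurrence count, `nunique` counts distinct IDs, so a location duplicated with the *same* ID would not appear here.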

Data processing

Convert time zone from UTC to Melbourne time

In [29]:
# Create a timezone object for Melbourne
melbourne_tz = pytz.timezone('Australia/Melbourne')

# Get the current time in UTC
current_time_utc = datetime.now(pytz.utc)

# Convert to Melbourne local time
current_time_melbourne = current_time_utc.astimezone(melbourne_tz)

print(f"Current time in UTC: {current_time_utc}")
print(f"Current time in Melbourne: {current_time_melbourne}")
Current time in UTC: 2024-05-11 03:40:22.985981+00:00
Current time in Melbourne: 2024-05-11 13:40:22.985981+10:00
In [30]:
# Check if 'status_timestamp' is timezone-naive
if df_analysis['status_timestamp'].dt.tz is None:
    # Localize naive datetime objects to UTC
    df_analysis['status_timestamp'] = df_analysis['status_timestamp'].dt.tz_localize('UTC')
else:
    # If already timezone-aware, ensure it's in UTC
    df_analysis['status_timestamp'] = df_analysis['status_timestamp'].dt.tz_convert('UTC')

# Convert 'status_timestamp' from UTC to Melbourne local time
df_analysis['status_timestamp'] = df_analysis['status_timestamp'].dt.tz_convert('Australia/Melbourne')

# Display the first few 'status_timestamp' to check
print(df_analysis['status_timestamp'].head())
0   2023-10-25 12:07:05+11:00
1   2023-10-25 11:57:43+11:00
2   2023-10-25 12:12:46+11:00
3   2023-10-25 06:31:04+11:00
4   2023-10-25 10:56:41+11:00
Name: status_timestamp, dtype: datetime64[ns, Australia/Melbourne]

Get today’s data volume

In [31]:
print("Unique dates in the dataset:", df_analysis['status_timestamp'].dt.date.unique())

# Get today's date from the 'current_time_melbourne' which is already timezone-aware
today_melbourne = current_time_melbourne.date()
print("Today's date in Melbourne:", today_melbourne)

# Convert 'status_timestamp' to date only for filtering
df_analysis['status_date'] = df_analysis['status_timestamp'].dt.date

# Filter for today's data
todays_data = df_analysis[df_analysis['status_date'] == today_melbourne].copy()  # .copy() avoids SettingWithCopyWarning when adding columns later

# Count the number of rows in today's data
num_rows_today = len(todays_data)
print(f"There are {num_rows_today} rows in today's data.")
Unique dates in the dataset: [datetime.date(2023, 10, 25) datetime.date(2023, 10, 18)
 datetime.date(2023, 10, 24) datetime.date(2023, 9, 7)
 datetime.date(2023, 8, 12) datetime.date(2023, 11, 16)
 datetime.date(2023, 11, 22) datetime.date(2023, 6, 20)
 datetime.date(2023, 11, 17) datetime.date(2023, 11, 20)
 datetime.date(2023, 11, 21) datetime.date(2023, 11, 13)
 datetime.date(2023, 11, 11) datetime.date(2023, 11, 14)
 datetime.date(2023, 11, 9) datetime.date(2023, 11, 8)
 datetime.date(2023, 11, 15) datetime.date(2023, 10, 27)
 datetime.date(2023, 12, 5) datetime.date(2023, 11, 3)
 datetime.date(2023, 11, 1) datetime.date(2023, 11, 26)
 datetime.date(2023, 12, 4) datetime.date(2023, 9, 22)
 datetime.date(2023, 7, 5) datetime.date(2023, 11, 6)
 datetime.date(2023, 9, 14) datetime.date(2023, 10, 16)
 datetime.date(2023, 8, 16) datetime.date(2023, 8, 14)
 datetime.date(2023, 8, 8) datetime.date(2023, 10, 19)
 datetime.date(2023, 10, 11) datetime.date(2023, 8, 4)
 datetime.date(2023, 9, 6) datetime.date(2023, 8, 11)
 datetime.date(2023, 9, 27) datetime.date(2023, 7, 4)
 datetime.date(2023, 10, 17) datetime.date(2024, 3, 15)
 datetime.date(2023, 8, 17) datetime.date(2023, 12, 14)
 datetime.date(2023, 12, 13) datetime.date(2023, 12, 15)
 datetime.date(2023, 6, 12) datetime.date(2023, 12, 6)
 datetime.date(2023, 12, 7) datetime.date(2023, 10, 26)
 datetime.date(2023, 10, 4) datetime.date(2023, 8, 2)
 datetime.date(2023, 11, 12) datetime.date(2023, 9, 28)
 datetime.date(2023, 8, 5) datetime.date(2023, 10, 30)
 datetime.date(2023, 11, 29) datetime.date(2023, 12, 8)
 datetime.date(2023, 11, 30) datetime.date(2023, 11, 28)
 datetime.date(2023, 12, 18) datetime.date(2023, 11, 2)
 datetime.date(2023, 8, 29) datetime.date(2023, 10, 3)
 datetime.date(2023, 8, 15) datetime.date(2023, 12, 2)
 datetime.date(2023, 10, 15) datetime.date(2023, 10, 28)
 datetime.date(2023, 9, 1) datetime.date(2023, 10, 31)
 datetime.date(2023, 12, 12) datetime.date(2023, 10, 5)
 datetime.date(2023, 12, 20) datetime.date(2023, 12, 17)
 datetime.date(2023, 10, 6) datetime.date(2024, 1, 20)
 datetime.date(2024, 1, 13) datetime.date(2023, 12, 28)
 datetime.date(2024, 1, 22) datetime.date(2024, 1, 18)
 datetime.date(2024, 1, 19) datetime.date(2024, 1, 21)
 datetime.date(2024, 1, 23) datetime.date(2024, 1, 17)
 datetime.date(2023, 9, 24) datetime.date(2024, 1, 31)
 datetime.date(2024, 1, 30) datetime.date(2024, 2, 7)
 datetime.date(2024, 2, 8) datetime.date(2024, 2, 5)
 datetime.date(2024, 1, 27) datetime.date(2024, 1, 29)
 datetime.date(2024, 1, 2) datetime.date(2024, 2, 6)
 datetime.date(2024, 2, 14) datetime.date(2024, 2, 15)
 datetime.date(2024, 2, 16) datetime.date(2024, 2, 4)
 datetime.date(2024, 2, 22) datetime.date(2024, 2, 19)
 datetime.date(2024, 2, 18) datetime.date(2023, 8, 28)
 datetime.date(2024, 2, 29) datetime.date(2024, 1, 12)
 datetime.date(2024, 1, 6) datetime.date(2024, 3, 2)
 datetime.date(2023, 12, 16) datetime.date(2024, 2, 21)
 datetime.date(2024, 1, 24) datetime.date(2024, 2, 26)
 datetime.date(2024, 3, 4) datetime.date(2024, 3, 6)
 datetime.date(2023, 12, 22) datetime.date(2024, 1, 28)
 datetime.date(2023, 12, 31) datetime.date(2024, 2, 20)
 datetime.date(2024, 3, 16) datetime.date(2024, 3, 13)
 datetime.date(2024, 3, 14) datetime.date(2024, 3, 25)
 datetime.date(2024, 3, 17) datetime.date(2024, 3, 18)
 datetime.date(2024, 4, 5) datetime.date(2024, 4, 4)
 datetime.date(2024, 3, 8) datetime.date(2024, 3, 21)
 datetime.date(2024, 3, 22) datetime.date(2024, 4, 11)
 datetime.date(2024, 4, 10) datetime.date(2024, 3, 19)
 datetime.date(2024, 2, 9) datetime.date(2023, 12, 9)
 datetime.date(2024, 3, 1) datetime.date(2024, 1, 26)
 datetime.date(2024, 4, 8) datetime.date(2024, 2, 23)
 datetime.date(2024, 2, 12) datetime.date(2024, 4, 9)
 datetime.date(2024, 4, 1) datetime.date(2024, 3, 23)
 datetime.date(2024, 4, 30) datetime.date(2024, 4, 29)
 datetime.date(2024, 4, 24) datetime.date(2024, 4, 13)
 datetime.date(2024, 4, 17) datetime.date(2024, 4, 16)
 datetime.date(2023, 6, 13) datetime.date(2024, 4, 15)
 datetime.date(2024, 4, 12) datetime.date(2024, 4, 19)
 datetime.date(2024, 4, 18) datetime.date(2024, 4, 21)
 datetime.date(2024, 4, 7) datetime.date(2024, 4, 22)
 datetime.date(2024, 4, 23) datetime.date(2024, 5, 2)
 datetime.date(2024, 5, 3) datetime.date(2024, 5, 6)
 datetime.date(2024, 3, 27) datetime.date(2024, 3, 31)
 datetime.date(2024, 4, 2) datetime.date(2024, 5, 11)
 datetime.date(2024, 5, 9) datetime.date(2024, 5, 8)
 datetime.date(2024, 5, 10) datetime.date(2023, 12, 1)
 datetime.date(2024, 5, 7) datetime.date(2024, 5, 5)
 datetime.date(2023, 11, 23) datetime.date(2024, 2, 17)
 datetime.date(2023, 9, 13) datetime.date(2024, 3, 26)
 datetime.date(2023, 8, 23) datetime.date(2023, 6, 8)
 datetime.date(2023, 6, 7) datetime.date(2024, 2, 25)
 datetime.date(2024, 4, 28) datetime.date(2023, 8, 24)
 datetime.date(2023, 11, 27) datetime.date(2024, 3, 5)
 datetime.date(2024, 5, 1) datetime.date(2024, 3, 20)
 datetime.date(2024, 4, 27) datetime.date(2024, 5, 4)
 datetime.date(2024, 2, 2) datetime.date(2023, 9, 25)
 datetime.date(2024, 4, 3) datetime.date(2023, 10, 10)
 datetime.date(2023, 8, 27) datetime.date(2023, 6, 4)
 datetime.date(2024, 3, 7) datetime.date(2024, 3, 28)
 datetime.date(2023, 10, 8) datetime.date(2024, 4, 20)
 datetime.date(2024, 2, 1) datetime.date(2024, 3, 10)
 datetime.date(2024, 4, 26) datetime.date(2023, 6, 5)
 datetime.date(2023, 5, 28) datetime.date(2023, 8, 22)
 datetime.date(2023, 8, 21) datetime.date(2024, 1, 15)
 datetime.date(2024, 3, 29)]
Today's date in Melbourne: 2024-05-11
There are 3769 rows in today's data.

Get today's data time range

In [32]:
if not todays_data.empty:
    min_timestamp_today = todays_data['status_timestamp'].min()
    max_timestamp_today = todays_data['status_timestamp'].max()
    print(f"The data for today ranges from {min_timestamp_today} to {max_timestamp_today}.")
else:
    print("No data available for today's date to determine the time range.")
The data for today ranges from 2024-05-11 00:02:02+10:00 to 2024-05-11 13:28:51+10:00.

Check if there are parking spaces with multiple statuses today

In [33]:
if todays_data.empty:
    print("No data available for today's analysis.")
else:
    # Group by unique identifier for each parking space and count unique statuses
    status_counts = todays_data.groupby(
        ['latitude', 'longitude'])['status_description'].nunique().reset_index(
            name = 'Unique_Status_Count')

    # Filter for parking spaces with more than one unique status
    multiple_statuses = status_counts[status_counts['Unique_Status_Count'] > 1]

    # Display the result
    if multiple_statuses.empty:
        print("No parking spaces with multiple statuses found for today.")
    else:
        print("Parking spaces with multiple statuses found for today:")
        print(multiple_statuses)
No parking spaces with multiple statuses found for today.

Today's data analysis & Visualization

Sensor visualization of all parking spaces in Melbourne city centre.

In [34]:
# Initialize the map centered around a latitude and longitude
m = folium.Map(location = [-37.814, 144.963], zoom_start = 16)  # 16 is the zoom level

# Add points to the map using df_analysis
for idx, row in todays_data.iterrows():
    color = "blue" if row['status_description'] == "Unoccupied" else "red"
    popup_text = f"Status: {row['status_description']}<br>Updated: {row['status_timestamp']}"
    folium.CircleMarker(
        location = [row['latitude'], row['longitude']],
        radius = 1,
        color = color,
        fill = True,
        fill_color = color,
        popup = popup_text
    ).add_to(m)

# Show the map
m
Out[34]:
[Interactive folium map: blue markers = unoccupied bays, red markers = occupied bays]

Analyze data time distribution

In [35]:
if todays_data.empty:
    print("No data available for today. Cannot perform update time analysis.")
else:
    # Extract the hour from the 'status_timestamp' and create a new column 'update_hour'
    todays_data['update_hour'] = todays_data['status_timestamp'].dt.hour

    # Create a frequency table for the update times
    frequency_table = todays_data['update_hour'].value_counts().sort_index()
    print("Frequency table for update times:")
    print(frequency_table)

    # If there are no update hours, inform the user
    if frequency_table.empty:
        print("No updates have been recorded today.")
    else:
        # Plot a histogram for the update times
        plt.figure(figsize = (10, 6))
        plt.hist(todays_data['update_hour'], bins = 24, range = (0, 23), edgecolor = 'black', alpha = 0.7)
        plt.xticks(range(24))
        plt.xlabel('Hour of the Day')
        plt.ylabel('Frequency')
        plt.title('Distribution of Update Times Today')
        plt.grid(True)
        plt.show()
Frequency table for update times:
update_hour
0       12
1       10
2        5
3       11
4        6
5       11
6       58
7       72
8       51
9      119
10     253
11     572
12    1347
13    1242
Name: count, dtype: int64
[Figure: histogram of today's update times by hour of the day]

Split into vacant and occupied status to visualize time distribution

In [37]:
# ensure 'todays_data' contains data
if todays_data.empty:
    print("No data available for today. Cannot perform update time analysis.")
else:
    # Filter data for 'Present' and 'Unoccupied' states
    present_data = todays_data[todays_data['status_description'] == 'Present']
    unoccupied_data = todays_data[todays_data['status_description'] == 'Unoccupied']

    # Create subplots
    fig, axes = plt.subplots(1, 2, figsize = (12, 6))

    # Find the maximum frequency to set the same y-axis limit for both subplots
    # Check if either DataFrame is empty to avoid NaN in max_frequency
    max_freq_present = present_data['update_hour'].value_counts().max() if not present_data.empty else 0
    max_freq_unoccupied = unoccupied_data['update_hour'].value_counts().max() if not unoccupied_data.empty else 0
    max_frequency = max(max_freq_present, max_freq_unoccupied)

    # Plot histogram for 'Present' state if data is available
    if not present_data.empty:
        axes[0].hist(present_data['update_hour'], bins = 24, edgecolor = 'black', range = (0, 23))
        axes[0].set_title('Distribution of Update Times for Present Spaces')
        axes[0].set_xlabel('Hour of the Day')
        axes[0].set_ylabel('Frequency')
        axes[0].set_ylim([0, max_frequency + 1])  # +1 to ensure visibility if max_frequency is 0
    else:
        axes[0].text(0.5, 0.5, 'No data for Present Spaces', horizontalalignment = 'center', verticalalignment = 'center')
        axes[0].set_title('Distribution of Update Times for Present Spaces')
        axes[0].set_xlabel('Hour of the Day')
        axes[0].set_ylabel('Frequency')

    # Plot histogram for 'Unoccupied' state if data is available
    if not unoccupied_data.empty:
        axes[1].hist(unoccupied_data['update_hour'], bins = 24, edgecolor = 'black', range = (0, 23))
        axes[1].set_title('Distribution of Update Times for Unoccupied Spaces')
        axes[1].set_xlabel('Hour of the Day')
        axes[1].set_ylabel('Frequency')
        axes[1].set_ylim([0, max_frequency + 1])  # +1 to ensure visibility if max_frequency is 0
    else:
        axes[1].text(0.5, 0.5, 'No data for Unoccupied Spaces', horizontalalignment = 'center', verticalalignment = 'center')
        axes[1].set_title('Distribution of Update Times for Unoccupied Spaces')
        axes[1].set_xlabel('Hour of the Day')
        axes[1].set_ylabel('Frequency')

    # Show the plots
    plt.tight_layout()
    plt.show()
[Figure: side-by-side histograms of update times for Present and Unoccupied spaces]

Find parking restrictions

A parking space involves three parts: the parking bay, the bay description (its restrictions), and the sensor. These are managed by different departments, so different datasets must be queried. By matching the geographic locations of the parking sensors against the parking bays, we obtain the bay ID for each location; the bay description can then be looked up via that ID.

Get the kerbside id of the parking space¶

Fetch the whole dataset from City of Melbourne Open Portal.

In [38]:
# export endpoint

# https://data.melbourne.vic.gov.au/explore/dataset/on-street-parking-bays/table/
dataset2_id = 'on-street-parking-bays'

base_URL = 'https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets/'
# apikey = api_key
export_format = 'csv'

url = f'{base_URL}{dataset2_id}/exports/{export_format}'
params = {
    'select': '*',
    'limit': -1,  # all records
    'lang': 'en',
    'timezone': 'UTC',
    # 'api_key': apikey
}

# GET request
response = requests.get(url, params = params)

if response.status_code == 200:
    # StringIO to read the CSV data
    url_content = response.content.decode('utf-8')
    bay_df = pd.read_csv(StringIO(url_content), delimiter = ';')
    print(bay_df.sample(2, random_state = 999))    # test
else:
    print(f'Request failed with status code {response.status_code}')
      roadsegmentid kerbsideid  \
153           20992       9609   
9672          22090        NaN   

                                 roadsegmentdescription   latitude  \
153   Errol Street between Victoria Street and Queen... -37.803762   
9672  Wadey Street between St Kilda Road and Wells S... -37.826066   

       longitude lastupdated  
153   144.949407  2022-08-31  
9672  144.970160  2022-08-31  
In [39]:
# check length in dataset
print(f'The dataset contains {len(bay_df)} records.')
The dataset contains 19162 records.

Integrate the road segment id and description into the parking sensor dataframe.

In [40]:
exclude = ['latitude', 'longitude', 'lastupdated']
bay_selected = bay_df.drop(exclude, axis = 1)
In [42]:
# keep records whose kerbsideid is numeric or missing
# (str.isnumeric() returns NaN for missing values, and NaN != False is True)
filtered_bay = bay_selected[bay_selected['kerbsideid'].str.isnumeric() != False].copy()

filtered_bay['kerbsideid'] = pd.to_numeric(
    filtered_bay['kerbsideid'], errors = 'coerce').astype('Int64')
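The `!= False` comparison above is subtle: `str.isnumeric()` yields NaN (not False) for missing values, so numeric strings *and* NaN entries survive the filter, and `errors='coerce'` then turns the NaN entries into pandas' nullable `<NA>`. A minimal illustration:

```python
import pandas as pd

s = pd.Series(['9609', 'abc', None], dtype='object')
mask = s.str.isnumeric() != False   # [True, False, True]: NaN != False is True
kept = s[mask]                      # '9609' and the missing value survive
ids = pd.to_numeric(kept, errors='coerce').astype('Int64')
print(ids)
```

So non-numeric junk like `'abc'` is dropped, while genuinely missing ids are carried through as `<NA>` rather than silently discarded.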
In [43]:
# Perform a left merge
merged_df = pd.merge(df_analysis, filtered_bay, on = ['kerbsideid'], how = 'left')
print(merged_df.head(2))
          lastupdated          status_timestamp  zone_number  \
0 2023-10-25 01:44:02 2023-10-25 12:07:05+11:00       7556.0   
1 2023-10-25 01:44:02 2023-10-25 11:57:43+11:00       7556.0   

  status_description  kerbsideid   latitude   longitude status_date  \
0            Present        5730 -37.810582  144.968053  2023-10-25   
1            Present        5728 -37.810566  144.968109  2023-10-25   

   roadsegmentid                             roadsegmentdescription  
0        20003.0  Lonsdale Street between Exhibition Street and ...  
1        20003.0  Lonsdale Street between Exhibition Street and ...  
In [44]:
merged_df.shape
Out[44]:
(6158, 10)
In [45]:
# Check duplicated rows based on specific columns
kerb_dup = merged_df.duplicated(subset = ['kerbsideid'], keep = False)

# Display the duplicated rows based on 'kerbsideid'
merged_df[kerb_dup]
Out[45]:
lastupdated status_timestamp zone_number status_description kerbsideid latitude longitude status_date roadsegmentid roadsegmentdescription
416 2023-12-05 00:44:41 2023-12-05 09:15:56+11:00 7644.0 Present 18002 -37.817446 144.953536 2023-12-05 20073.0 Spencer Street between Little Collins Street a...
417 2023-12-05 00:44:41 2023-12-05 09:15:56+11:00 7644.0 Present 18002 -37.817446 144.953536 2023-12-05 20073.0 Spencer Street between Little Collins Street a...
418 2023-12-05 00:44:41 2023-10-11 20:27:19+11:00 7333.0 Present 17806 -37.815782 144.961823 2023-10-11 20099.0 Little Collins Street between Elizabeth Street...
419 2023-12-05 00:44:41 2023-10-11 20:27:19+11:00 7333.0 Present 17806 -37.815782 144.961823 2023-10-11 20099.0 Little Collins Street between Elizabeth Street...
423 2023-12-05 00:44:41 2023-12-04 05:16:37+11:00 7333.0 Present 17807 -37.815764 144.961886 2023-12-04 20099.0 Little Collins Street between Elizabeth Street...
424 2023-12-05 00:44:41 2023-12-04 05:16:37+11:00 7333.0 Present 17807 -37.815764 144.961886 2023-12-04 20099.0 Little Collins Street between Elizabeth Street...
5924 2024-05-11 03:30:26 2024-05-11 11:31:01+10:00 7269.0 Present 24986 -37.811248 144.975538 2024-05-11 21852.0 Parliament Place between St Andrews Place and ...
5925 2024-05-11 03:30:26 2024-05-11 11:31:01+10:00 7269.0 Present 24986 -37.811248 144.975538 2024-05-11 20073.0 Spencer Street between Little Collins Street a...
In [46]:
# Drop duplicated rows based on 'kerbsideid'
merged_df = merged_df.drop_duplicates(subset=['kerbsideid'], keep='first')
print(merged_df.shape)
(6154, 10)

Find out how many kerbsideid values are missing in total.

In [47]:
# Count the number of NaN values in the 'kerbsideid' column
num_nan = merged_df['kerbsideid'].isna().sum()

print(f"There are {num_nan} NaN values in the 'kerbsideid' column.")
There are 0 NaN values in the 'kerbsideid' column.

Find out how many kerbsideid values are missing in today's data.

In [48]:
# Filter to only include today's data
todays_data = merged_df[merged_df['status_date'] == today_melbourne]

# Count the number of NaN values in the 'kerbsideid' column for today's data
num_nan_today = todays_data['kerbsideid'].isna().sum()

print(f"There are {num_nan_today} NaN values in the 'kerbsideid' column for today's data.")
There are 0 NaN values in the 'kerbsideid' column for today's data.

Get parking space description¶

In [49]:
# export endpoint

# https://data.melbourne.vic.gov.au/explore/dataset/on-street-car-park-bay-restrictions/table/
dataset3_id = 'on-street-car-park-bay-restrictions'

base_URL = 'https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets/'
# apikey = api_key
export_format = 'csv'

url = f'{base_URL}{dataset3_id}/exports/{export_format}'
params = {
    'select': '*',
    'limit': -1,  # all records
    'lang': 'en',
    'timezone': 'UTC',
    # 'api_key': apikey
}

# GET request
response = requests.get(url, params = params)

if response.status_code == 200:
    # StringIO to read the CSV data
    url_content = response.content.decode('utf-8')
    restrictions = pd.read_csv(StringIO(url_content), delimiter = ';')
    print(restrictions.sample(2, random_state = 999))    # test
else:
    print(f'Request failed with status code {response.status_code}')
      bayid  deviceid             description1       description2  \
540    6151     30237  2P MTR M-SAT 7:30-18:30  2P SUN 7:30-18:30   
2353   6759     23157  2P MTR M-SAT 7:30-18:30  2P SUN 7:30-18:30   

     description3 description4 description5 description6  disabilityext1  \
540           NaN          NaN          NaN          NaN             240   
2353          NaN          NaN          NaN          NaN             240   

      disabilityext2  ...  today3  today4  today5  today6  typedesc1  \
540            240.0  ...     NaN     NaN     NaN     NaN   2P Meter   
2353           240.0  ...     NaN     NaN     NaN     NaN   2P Meter   

      typedesc2  typedesc3  typedesc4  typedesc5  typedesc6  
540          2P        NaN        NaN        NaN        NaN  
2353         2P        NaN        NaN        NaN        NaN  

[2 rows x 62 columns]
In [50]:
# check length in dataset 3

print(f'The dataset contains {len(restrictions)} records.')
The dataset contains 4263 records.

Take columns description1–description5, which describe the parking bay restrictions. On inspecting On-street Car Park Bay Restrictions online, description6 is documented as a time-limit description, but in practice it only holds the placeholder date 1 January, AD 1, with no accompanying data description, so it is excluded manually.
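The description strings follow a loose pattern such as `'2P MTR M-SAT 7:30-18:30'`, where the leading `2P` encodes a 2-hour limit. A sketch of extracting that limit with a regex; the pattern is an assumption inferred from the sampled rows, not a documented format, and `parse_limit_hours` is a hypothetical helper:

```python
import re
import pandas as pd

def parse_limit_hours(desc):
    """Extract the hour limit from strings like '2P MTR M-SAT 7:30-18:30'.

    Returns None when the description is missing or has no '<n>P' prefix.
    """
    if not isinstance(desc, str):
        return None
    m = re.match(r'(\d+)P\b', desc)
    return int(m.group(1)) if m else None

samples = pd.Series(['2P MTR M-SAT 7:30-18:30', '1P SUN 7:30-18:30', None])
print(samples.map(parse_limit_hours).tolist())  # [2, 1, None]
```

A parser like this would let the later analysis use real per-bay limits wherever a description exists, instead of a single assumed limit.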

In [51]:
# rename the 'bayid' column to 'kerbsideid'
restrictions.rename(columns = {'bayid': 'kerbsideid'}, inplace = True)
In [52]:
# Merge merged_df with only the relevant columns from restrictions
columns_to_merge = ['kerbsideid', 'description1', 'description2',
                    'description3', 'description4', 'description5']
merged_df = pd.merge(
    merged_df,
    restrictions[columns_to_merge],
    on = 'kerbsideid',
    how = 'left')
In [53]:
print(merged_df.head(2))
          lastupdated          status_timestamp  zone_number  \
0 2023-10-25 01:44:02 2023-10-25 12:07:05+11:00       7556.0   
1 2023-10-25 01:44:02 2023-10-25 11:57:43+11:00       7556.0   

  status_description  kerbsideid   latitude   longitude status_date  \
0            Present        5730 -37.810582  144.968053  2023-10-25   
1            Present        5728 -37.810566  144.968109  2023-10-25   

   roadsegmentid                             roadsegmentdescription  \
0        20003.0  Lonsdale Street between Exhibition Street and ...   
1        20003.0  Lonsdale Street between Exhibition Street and ...   

              description1       description2 description3 description4  \
0  2P MTR M-SAT 7:30-18:30  2P SUN 7:30-18:30          NaN          NaN   
1                      NaN                NaN          NaN          NaN   

  description5  
0          NaN  
1          NaN  

Before assuming time limits, separate out disabled ('DIS') parking bays.

In [54]:
# Columns to check for "DIS"
columns_to_check = ['description1', 'description2', 'description3', 'description4', 'description5']

# Find rows where any of the specified columns contain "DIS"
contains_dis = merged_df[columns_to_check].apply(lambda x: x.str.contains('DIS', na = False)).any(axis = 1)

# Extract rows with "DIS" into a separate DataFrame
disabled_parking_df = merged_df[contains_dis]

# Delete rows with "DIS" from the original DataFrame
merged_df = merged_df[~contains_dis]

# Display the new DataFrames
print("Disabled Parking Data:")
print(disabled_parking_df.head())

print("Remaining Data:")
print(merged_df.head())
Disabled Parking Data:
            lastupdated          status_timestamp  zone_number  \
31  2023-10-25 01:44:02 2023-10-25 12:03:28+11:00       7572.0   
35  2023-10-25 01:44:02 2023-10-25 12:23:52+11:00       7572.0   
55  2023-11-22 00:44:42 2023-11-22 11:07:08+11:00          NaN   
122 2023-12-05 00:44:41 2023-10-25 11:19:04+11:00       7570.0   
196 2023-10-25 01:44:02 2023-10-25 12:40:25+11:00       7572.0   

    status_description  kerbsideid   latitude   longitude status_date  \
31             Present        6081 -37.809592  144.971447  2023-10-25   
35             Present        6087 -37.809536  144.971637  2023-10-25   
55             Present        6237 -37.817866  144.958069  2023-11-22   
122            Present        6035 -37.809817  144.970344  2023-10-25   
196         Unoccupied        6080 -37.809599  144.971422  2023-10-25   

     roadsegmentid                             roadsegmentdescription  \
31         20001.0  Lonsdale Street between Spring Street and Exhi...   
35         20001.0  Lonsdale Street between Spring Street and Exhi...   
55         20118.0  Collins Street between William Street and King...   
122        20001.0  Lonsdale Street between Spring Street and Exhi...   
196        20001.0  Lonsdale Street between Spring Street and Exhi...   

                  description1            description2  \
31     2P DIS M-SAT 7:30-18:30                     NaN   
35   P DIS Sun 00:00 - 11:59PM  P DIS M-Sat 6.30-11.59   
55     2P DIS M-SUN 0:00-23:59                     NaN   
122    2P DIS M-SAT 7:30-18:30                     NaN   
196    2P DIS M-SAT 7:30-18:30                     NaN   

                description3 description4 description5  
31                       NaN          NaN          NaN  
35   2P DIS M-SAT 7:30-18:30          NaN          NaN  
55                       NaN          NaN          NaN  
122                      NaN          NaN          NaN  
196                      NaN          NaN          NaN  
Remaining Data:
          lastupdated          status_timestamp  zone_number  \
0 2023-10-25 01:44:02 2023-10-25 12:07:05+11:00       7556.0   
1 2023-10-25 01:44:02 2023-10-25 11:57:43+11:00       7556.0   
2 2023-10-25 01:44:02 2023-10-25 12:12:46+11:00       7556.0   
3 2023-10-25 01:44:02 2023-10-25 06:31:04+11:00       7556.0   
4 2023-10-25 01:44:02 2023-10-25 10:56:41+11:00       7556.0   

  status_description  kerbsideid   latitude   longitude status_date  \
0            Present        5730 -37.810582  144.968053  2023-10-25   
1            Present        5728 -37.810566  144.968109  2023-10-25   
2            Present        5750 -37.810558  144.968137  2023-10-25   
3            Present        5743 -37.810321  144.968947  2023-10-25   
4            Present        5749 -37.810286  144.969062  2023-10-25   

   roadsegmentid                             roadsegmentdescription  \
0        20003.0  Lonsdale Street between Exhibition Street and ...   
1        20003.0  Lonsdale Street between Exhibition Street and ...   
2        20003.0  Lonsdale Street between Exhibition Street and ...   
3        20003.0  Lonsdale Street between Exhibition Street and ...   
4        20003.0  Lonsdale Street between Exhibition Street and ...   

              description1       description2 description3 description4  \
0  2P MTR M-SAT 7:30-18:30  2P SUN 7:30-18:30          NaN          NaN   
1                      NaN                NaN          NaN          NaN   
2                      NaN                NaN          NaN          NaN   
3  1P MTR M-SAT 7:30-18:30  1P SUN 7:30-18:30          NaN          NaN   
4  2P MTR M-SAT 7:30-18:30  2P SUN 7:30-18:30          NaN          NaN   

  description5  
0          NaN  
1          NaN  
2          NaN  
3          NaN  
4          NaN  

Check how many bays have no time-limit description at all (all five description columns empty).

In [55]:
# Columns to check for NaN values
columns_to_check = ['description1', 'description2', 'description3', 'description4', 'description5']

# Find rows where all specified columns are NaN
all_nan_rows = merged_df[columns_to_check].isna().all(axis = 1)

# Count the number of such rows
num_all_nan_rows = all_nan_rows.sum()

# Display the count
print(f"Number of rows where all of the specified columns are NaN: {num_all_nan_rows}")
Number of rows where all of the specified columns are NaN: 5481

Conclusion¶

  • We have integrated the bay id and parking restrictions for each bay; the number of records is unchanged.

  • Inspection shows that for most sensor-equipped bays, the time limits have never been integrated into the restrictions database, so the missing descriptions do not reflect reality.

  • The three datasets (parking sensors, parking bays, and parking restrictions) come from three different systems or departments, so missing data is to be expected. We will therefore assume parking time restrictions in the analysis below.

Today's parking space analysis¶

Because most bays have no recorded time limit, we assume a parking limit of 2 hours and an average stay of 1 hour. The sensors provide only a snapshot: each records its most recent state. Shifting each observed vacant bay back one hour gives an assumed parking start time, from which we can estimate the most recent theoretical parking peak.

Time processing¶

In [56]:
# Step 1: Create a copy of today's data
todays_data_copy = todays_data.copy()

# Step 2: Filter to only include rows where the parking space is vacant
vacant_indices = todays_data_copy[todays_data_copy['status_description'] == 'Unoccupied'].index

# Step 3: Subtract 60 minutes from the timestamp of these vacant spaces
todays_data_copy.loc[vacant_indices, 'status_timestamp'] = todays_data_copy.loc[
    vacant_indices, 'status_timestamp'] - Minute(60)

# Step 4: Update the 'status_date' column based on the new 'status_timestamp'
todays_data_copy['status_date'] = todays_data_copy['status_timestamp'].dt.date

# Step 5: Change the status of these shifted vacant spaces to 'Occupied'
todays_data_copy.loc[vacant_indices, 'status_description'] = 'Occupied'

# Step 6: Remove rows where the 'status_date' is not today
todays_data_copy = todays_data_copy[todays_data_copy['status_date'] == today_melbourne]

Popular parking hours¶

In [57]:
# Convert 'status_timestamp' to just the hour of the day
todays_data_copy['status_hour'] = todays_data_copy['status_timestamp'].dt.hour

# Create a frequency table for the update times
frequency_table = todays_data_copy['status_hour'].value_counts().sort_index()

# Plot a histogram for the update times
plt.figure(figsize = (12, 6))
plt.bar(frequency_table.index, frequency_table.values, color = 'blue', alpha = 0.7, label = 'Frequency')
plt.xlabel('Hour of the Day')
plt.ylabel('Frequency')
plt.title('Temporal Distribution of Parking Space Occupancy')
plt.xticks(range(0, 24))
plt.grid(True)
plt.legend()
plt.show()

If the frequency is still level at the current hour, the parking peak can be considered ongoing; if it has already begun to decline, the most recent peak has passed.
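That reading can be made concrete by comparing the current hour's count against the running peak so far. `peak_status` is a hypothetical helper and the 90% threshold and hour counts are illustrative assumptions, not derived from the data above:

```python
def peak_status(hour_counts, current_hour):
    """Classify whether the parking peak is ongoing or has passed.

    hour_counts: dict mapping hour-of-day -> occupancy frequency
    current_hour: the hour to evaluate (0-23)
    """
    seen = [hour_counts.get(h, 0) for h in range(current_hour + 1)]
    peak = max(seen)      # largest count observed up to now
    current = seen[-1]    # count at the current hour
    # "level" = still within 90% of the peak seen so far (assumed threshold)
    return 'peak ongoing' if current >= 0.9 * peak else 'peak passed'

counts = {7: 40, 8: 120, 9: 180, 10: 175, 11: 90}
print(peak_status(counts, 10))  # -> 'peak ongoing' (175 >= 0.9 * 180)
print(peak_status(counts, 11))  # -> 'peak passed'  (90 < 0.9 * 180)
```

The same rule could be applied to `frequency_table` above to turn the visual judgement into an automatic one.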

Parking hotspot visualization¶

Today's parking spaces are clustered and marked on the map.

In [58]:
# Create a new DataFrame with only the latitude and longitude columns
X = todays_data_copy[['latitude', 'longitude']].values

# Run DBSCAN clustering
dbscan = DBSCAN(eps = 0.0005, min_samples = 20)
labels = dbscan.fit_predict(X)

# Add cluster labels back to DataFrame
todays_data_copy['cluster_label'] = labels

# Calculate the centroid of each cluster
cluster_centroids = todays_data_copy.groupby('cluster_label').agg(
    {'latitude': 'mean', 'longitude': 'mean'}).reset_index()

# Initialize the map
m = folium.Map(location = [-37.814, 144.963], zoom_start = 16)

# Add circles for each cluster
for idx, row in cluster_centroids.iterrows():
    if row['cluster_label'] != -1:  # -1 is the label for noise points in DBSCAN
        folium.Circle(
            location = [row['latitude'], row['longitude']],
            radius = 50,  # Radius
            color = 'blue',
            fill = True,
            fill_color = 'blue'
        ).add_to(m)

# Show the map
m
Out[58]:
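Note that the DBSCAN `eps` above is expressed in raw degrees (0.0005° is roughly 55 m of latitude at Melbourne, but fewer metres of longitude). An alternative sketch clusters with the haversine metric so `eps` can be given in metres; the 55 m radius, `min_samples`, and coordinates here are illustrative assumptions, not values from the dataset:

```python
import numpy as np
from sklearn.cluster import DBSCAN

EARTH_RADIUS_M = 6_371_000

# A few illustrative CBD coordinates (lat, lon in degrees)
coords = np.array([
    [-37.8140, 144.9630],
    [-37.8141, 144.9631],
    [-37.8142, 144.9629],
    [-37.8200, 144.9700],   # isolated point, several hundred metres away
])

# haversine expects [lat, lon] in radians; eps becomes metres / Earth radius
eps_m = 55
db = DBSCAN(eps=eps_m / EARTH_RADIUS_M, min_samples=3, metric='haversine')
labels = db.fit_predict(np.radians(coords))
print(labels)  # first three points form one cluster; the last is noise (-1)
```

Using a distance metric in metres makes the clustering radius uniform in every direction, whereas a degree-based `eps` stretches east-west at Melbourne's latitude.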

Conclusion¶

We have tried several analysis methods so far and combined multiple APIs. However, because the bays lack recorded time limits and the sensors provide only snapshots, we can only analyse theoretical parking hotspots, which will deviate from the actual situation.

Analysing behaviour over a longer period would require polling the API continuously from a long-running server, which is difficult for ordinary readers to reproduce.

This analysis is intended as data exploration. It also serves as a reference for working with data that cannot be updated or fully integrated: it guides readers to recognise the importance of data screening and processing, shows that unprocessed data requires repeated analysis and filtering, and offers a reference for data-processing ideas.